A Preclinical Systematic Review and Meta-Analysis of Behavior Testing in Mice Models of Ischemic Stroke

Stroke remains one of the most important causes of death and disability. Preclinical research is a powerful tool for understanding the molecular and cellular response to stroke. However, a lack of standardization in animal evaluation means that results are not always reproducible. In the present study, we aimed to identify the best strategy for evaluating animal behavior after experimental stroke. To this end, we performed a meta-analysis of behavioral tests done on male C57BL/6 mice subjected to stroke or sham surgery. Overall, fifty-six studies were included. Our results suggest that different types of tests should be used depending on the post-stroke period one needs to analyze. In the hyper-acute post-stroke period, the best quantifier is animal examination scoring, as it is a fast and inexpensive way to identify differences between groups. When evaluating stroke mice in the acute phase, a mix of animal examination and motor tests that focus on movement asymmetry (foot-fault and cylinder testing) seems to have the best chance of picking up differences between groups. Complex tasks (the rotarod test and Morris water maze) should be used within the chronic phase to evaluate differences between the late-subacute and chronic phases.


Introduction
According to the Centers for Disease Control and Prevention, one in six deaths caused by cardiovascular disease is due to stroke [1]. With limited treatment options and multiple risk factors leading to new and/or recurrent strokes, it is essential to fully understand the complex molecular and cellular pathophysiology of cerebral ischemia and its long-term effects. Among the common risk factors of stroke are hypertension, obesity, diabetes, air pollution, smoking, an unbalanced diet, cholesterol, renal dysfunction, alcohol, and a sedentary lifestyle [2], but hematological disorders are the most frequent etiologies of ischemic stroke of unusual cause [3]. The pathophysiology, prognosis, and clinical characteristics of acute small-vessel ischemic strokes in humans differ from those of other types of cerebral infarcts. An essential line of future research would therefore be the assessment of experimental small-vessel ischemic stroke; unfortunately, optimal animal models of lacunar strokes that mimic the same underlying mechanisms are lacking at the moment [4]. Although the clinical setting provides first-hand observations, animal experiments with rodents are one of the most commonly used models of disease. A rodent model provides advantages that [...]

Selection of Studies and Data Extraction
The first selection was based on title and abstract, after which full texts were reviewed. For the current analysis, the inclusion and exclusion criteria were based on the PRISMA recommendations [8]. As such, we included studies that: (1) used C57BL/6 mice as experimental animals; (2) performed behavioral testing and/or used standard neurological scales; (3) involved stroke; (4) presented comparative data between sham, control, and other molecules/cells, or procedures (Supplementary Table S1); and (5) were published in an open access (OA) format. We selected studies that used various methods of stroke induction: the intraluminal monofilament model and ligation/cauterization of the middle cerebral artery (MCA); CCA, ECA, and/or MCA ligation; stroke induced by photothrombosis; stroke induced by endothelin-1; and stroke induced by electrocauterization. Studies that (1) used transgenic animals and did not include WT controls, (2) used a modified neurological scale, (3) were abstracts or posters, or (4) were not published in English were excluded from the analysis (Figure 1). The included studies were extracted and summarized independently by two of the authors (IKS-B, ADR-Z). Data were obtained by reviewing all the included studies, and information regarding the methodology of each study was assessed according to Table 1. If data were not directly presented in the body of the article, they were extracted from graphs or figures using WebPlotDigitizer (Rohatgi A., Pacifica, CA, USA). Any disagreements in data extraction were resolved through discussions with a third reviewer (BC) until a consensus was reached.

Quality Assessment
The quality of the analyzed articles was measured using a modified scoring system based on the guidelines for preclinical tests [64]. The items were the following: (1) use of permanent middle cerebral artery occlusion (MCAo) models; (2) randomization of the experiment; (3) monitoring of physiological parameters (temperature, blood pressure, blood glucose level); (4) tests performed in a blinded manner; (5) assessment of at least two outcome parameters; (6) outcome assessed within the first 3 days post-stroke; (7) outcome assessed beyond day 7 post-stroke; (8) use of an appropriate animal model (aged, diabetic, hypertensive); (9) a standalone statement in the article regarding compliance with animal welfare regulations; and (10) a statement of potential conflicts of interest. Each item was worth one point. Studies that received 0-3 points were classified as class III, studies between 4 and 7 were classified as class II, and studies above 8 were classified as class I.
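The scoring rule can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function names are hypothetical, and the sketch assumes that a score of exactly 8 falls into class I, since the stated boundaries do not assign it elsewhere.

```python
def quality_score(items_met: dict[int, bool]) -> int:
    """Sum one point for each of the 10 checklist items that a study meets."""
    return sum(1 for met in items_met.values() if met)


def quality_class(score: int) -> str:
    """Map a 0-10 quality score to class I, II, or III."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 3:
        return "III"
    if score <= 7:
        return "II"
    # Assumption: 8 and above counts as class I.
    return "I"


# Hypothetical study meeting items 2, 3, 5, 6, 7 and 9.
example = {item: item in {2, 3, 5, 6, 7, 9} for item in range(1, 11)}
print(quality_score(example), quality_class(quality_score(example)))
```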

Risk of Bias Assessment
The risk of bias was calculated for all included studies using the RoB 2 Excel Macro Form Manual (Beta Version 7), which is structured into a fixed set of domains of bias focusing on different aspects of trial design, conduct, and reporting. Within each domain, a series of questions (called "signaling questions") is used to retrieve information about features relevant to the risk of bias. Following this protocol, RoB 2 determines whether the trial has a "Low" or "High" risk of bias. The algorithm can also express "Some concerns" after the questions are answered. This tool has seven standard domains: random sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data (attrition bias), selective reporting (reporting bias), and other biases. The risk of bias/study quality of the included studies was assessed independently by two of the authors (IKS-B and ADR-Z). Any discrepancies were resolved through discussions with a third reviewer (BC) until a consensus was reached.

Statistical Analysis
The meta-analysis was performed with RevMan 5.4.1 (The Cochrane Collaboration, 2020, London, UK). The differences between stroke and sham were pooled as mean differences with 95% confidence intervals (CI) using the random or fixed effects model, depending on the heterogeneity between studies. Heterogeneity assessment was performed using the Chi-squared test and the I² statistic [65][66][67][68][69]. Because animal studies are often more heterogeneous with respect to size, design, and intervention protocols, a random effects model was used in most of the analyses. A p-value less than 0.05 was considered statistically significant. The results for each test were classified according to the phase of stroke as acute (<7 days post-stroke), early subacute (<14 days post-stroke), late sub-acute (14 to 21 days post-stroke), and chronic stroke (>21 days post-stroke) [70].
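To make the pooling step concrete, the following is a minimal sketch of an inverse-variance meta-analysis of mean differences with a DerSimonian-Laird random-effects adjustment and the I² statistic. It is a generic textbook formulation written for illustration; it is not RevMan's code, and the `Study` class, the function name, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class Study:
    """Summary data for one study (values used below are hypothetical)."""
    mean_stroke: float
    sd_stroke: float
    n_stroke: int
    mean_sham: float
    sd_sham: float
    n_sham: int


def pool_mean_difference(studies, random_effects=True, level=0.95):
    """Pool stroke-minus-sham mean differences by inverse-variance weighting."""
    mds = [s.mean_stroke - s.mean_sham for s in studies]
    variances = [s.sd_stroke ** 2 / s.n_stroke + s.sd_sham ** 2 / s.n_sham
                 for s in studies]
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)

    # Fixed-effect estimate, needed for Cochran's Q.
    fixed_md = sum(w * d for w, d in zip(weights, mds)) / sum_w
    q = sum(w * (d - fixed_md) ** 2 for w, d in zip(weights, mds))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance (zero for a fixed-effect model).
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if (random_effects and c > 0) else 0.0

    star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * d for w, d in zip(star, mds)) / sum(star)
    se = sqrt(1.0 / sum(star))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(pooled / se)))
    return {"md": pooled, "se": se, "ci_low": pooled - z_crit * se,
            "ci_high": pooled + z_crit * se, "p": p_value,
            "q": q, "i2": i2, "tau2": tau2}


# Three hypothetical studies reporting a neurological score.
example = [Study(9.5, 2.0, 10, 3.0, 1.5, 10),
           Study(8.0, 2.5, 8, 2.5, 1.0, 8),
           Study(11.0, 3.0, 12, 3.5, 2.0, 12)]
print(pool_mean_difference(example))
```

Setting `random_effects=False` yields the fixed-effect estimate, which mirrors the choice between the two models described above.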

Risk of Bias and Quality Assessment
Using the initial keyword search, 29,943 articles were identified through database searching. Prior to screening, a total of 28,375 articles were removed using automation tools or because they were duplicates. After title and abstract reading, 463 records were excluded before the full-text screening: 123 were abstracts/posters, 5 studies were withdrawn, 3 were written in Chinese, and 321 had no behavior tests. The remaining 1105 records were screened, of which 153 studies were done on rats, 412 used only transgenic mice without any wild-type (WT) controls, 3 used pigs as the animal model, 371 did not have WT controls or sham groups, 63 used modified neurological scales, 15 investigated neonatal stroke, 4 were human studies, and 1 used a monkey. In 27 articles, data were discussed but not shown. Only 56 studies 71] were included in this meta-analysis (Figure 1).
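The selection flow can be followed with simple arithmetic. The sketch below only re-adds the counts reported in this paragraph; the variable names and the dictionary labels are our own shorthand, not terms from the PRISMA diagram.

```python
identified = 29_943
removed_before_screening = 28_375      # automation tools or duplicates
excluded_on_title_abstract = 463

# Reasons for exclusion among the screened records, as listed in the text.
screening_exclusions = {
    "rat studies": 153,
    "transgenic mice only, no WT controls": 412,
    "pig model": 3,
    "no WT controls or sham groups": 371,
    "modified neurological scales": 63,
    "neonatal stroke": 15,
    "human studies": 4,
    "monkey model": 1,
    "data discussed but not shown": 27,
}

remaining = identified - removed_before_screening
screened = remaining - excluded_on_title_abstract
included = screened - sum(screening_exclusions.values())

print(remaining, screened, included)
```

Run as written, the three printed values are 1568, 1105, and 56, matching the number of screened records and included studies stated above.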
For the 56 remaining articles, the quality assessment was determined. For two of the 10 items (5 and 9), all articles had a high quality (Figure 2A). In contrast, in terms of item 8, every article had an unclear quality. Regarding item 1, 19.64% of the articles were of high quality, and regarding item 2, 37.5% of the articles were of high quality. For item 4, 48.21% of the articles had a high quality. For the remaining items, 3 (80.33%), 6 (76.78%), 7 (87.5%), and 10 (87.5%), most articles had a high quality. The majority of the included studies came from Asia, followed by the USA and Europe (Figure 2B). The median quality score of included studies was 6 (range, 3 to 8), and 75% of the articles belong to the second class (Figure 2C). Arranging forest plots by quality score did not reveal a relationship between study quality and the effect of treatment.
Regarding the overall risk of bias for the included studies (Supplementary Table S7), 35.7% of the articles were evaluated as having a low risk of bias, 41.1% were found to have some bias concerns, and 23.2% had a high risk of bias (Figure 2D). Selection of the reported result, missing outcome data, and bias arising from period and carryover effects were evaluated as having a low risk of bias for all investigated articles. The biggest concern was the measurement of the outcome domain, where 32.1% of the articles had a high risk of bias and 14.3% had some concerns. The remaining 53.6% were scored as having a low risk of bias in this domain. Most studies (66.1%) were found to have a low risk of bias when evaluating the deviations from the intended intervention domain, while the remaining 33.9% had some concerns. Concerns were most widespread in the randomization process domain, where only 44.6% of the articles had a low risk of bias while 55.4% were found to have some concerns. None of the included studies scored a high risk of bias in this domain.

Animal Examination and Some Motor Tasks Are Effective in Establishing Differences in the Hyper-Acute Post-Stroke Interval
Although most experimental stroke research papers included some sort of hyper-acute phase animal examination scoring, the nature of this scoring was not well defined in the majority of cases. We were able to identify and investigate the potential of three different scores (Supplementary Table S2). None of the animal examination scores showed differences between Sham and MCAo animals before stroke; however, in the hyper-acute and acute post-stroke periods, some differences were seen. Within 24 h post-stroke, both the Garcia and Clark scoring systems can be used to distinguish between groups. The Garcia score, although extensively used, displayed moderate power in distinguishing Shams from MCAo animals (mean difference, MD = −7.71, with a 95% confidence interval (CI) of −14.67 to −0.75, p = 0.03), compared to Clark (MD = 9.76 with a 95% CI of 9.16 to 10.36, p < 0.00001). At the end of the hyper-acute post-stroke interval, Longa [13,18,57] scoring (Figure 3C) was also able to distinguish between groups (MD = 2.23 with a 95% CI of 0.52 to 3.93, p = 0.01).
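The relationship between a pooled mean difference, its confidence interval, and the p-value can be checked with a short calculation. The sketch below assumes a normal (z) approximation, under which the standard error is the CI width divided by twice the critical value; the function name is hypothetical, and the inputs are the Garcia and Clark values quoted above.

```python
from statistics import NormalDist


def z_test_from_ci(md, ci_low, ci_high, level=0.95):
    """Recover the standard error, z statistic, and two-sided p from MD and CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = (ci_high - ci_low) / (2.0 * z_crit)
    z = md / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return se, z, p


garcia_se, garcia_z, garcia_p = z_test_from_ci(-7.71, -14.67, -0.75)
clark_se, clark_z, clark_p = z_test_from_ci(9.76, 9.16, 10.36)
print(f"Garcia: SE={garcia_se:.2f}, z={garcia_z:.2f}, p={garcia_p:.3f}")
print(f"Clark:  SE={clark_se:.2f}, z={clark_z:.2f}, p={clark_p:.2g}")
```

Under this approximation the Garcia interval corresponds to p of about 0.03, and the much narrower Clark interval to a p-value far below 0.00001, consistent with the reported values.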
Life 2023, 13, x FOR PEER REVIEW 9 of 24

Motor Tests and Some Animal Examination Scoring Are Effective in Establishing Differences in the Acute and Early Sub-Acute Post-Stroke Intervals
We found some studies using animal examination scoring to evaluate acute changes in MCAo animals compared to Shams. From these papers, we were able to identify the Clark neurological scale (MD = 8.57 with a 95% CI of 7.90 to 9.25, p < 0.00001) as the better option if animal examination scores are needed at such a late time point. While no data were found for Garcia at this time point, the only two articles applying the original Longa score reported no differences 7 days post-stroke (MD = 1.49 with a 95% CI of −1.31 to 4.30, p = 0.30) (Supplementary Table S8).
The evaluation of motor tasks in the acute (<7 days post-stroke) and early sub-acute (<14 days post-stroke) intervals identified cylinder, foot-fault, and rotarod as tests frequently used within this period. While the rotarod test showed consistent differences between the groups at 4, 5 (Supplementary Table S8), 7, and 14 days post-stroke (Figure 5A,B), the analyzed studies using the cylinder test were also able to show differences at 7 days

Discussion
With over 12 million cases and 6.5 million deaths worldwide, stroke remains a major health concern [72]. As such, considerable efforts have been made toward both its prevention and treatment. Although animal studies have identified several strategies for stroke improvement, there is a lack of translation from preclinical to clinical trials. Several attempts have been made to investigate the efficacy of behavior testing in animal models of white matter injury [73] and rodent models of stroke [74], some even measuring the validity and reliability of neurological scores in mice [75][76][77][78][79][80][81][82][83][84][85][86][87][88][89][90][91][92], but all have generated conflicting results.
In the present meta-analysis, we wanted to investigate the best strategy for evaluating animal behavior after an experimental stroke. We started this research due to increasing concerns regarding the reduction of the number of animals in preclinical studies [93,94]. While we agree with ethical and animal welfare concerns, there is still a need for accurate and reproducible research, especially in stroke, where translational data is almost non-existent. One very fast way to ensure the lowest number of control animals for one experiment is to calculate the needed "N" for the experiment starting from a given average and standard deviation. This can be calculated using different statistical powers, which may vary depending on the experimental design [95]. In theory, by using standardized tests, the results can be easily validated and the need for an increased number of individual controls can be lowered. Although some attempts have been made to standardize the behavior testing in mice [96], inter-lab variability, inter-investigator variability, and even inter-animal variability mean that the "N" does not always generate a good enough outcome for reproducible research. By using meta-data, this variability could be minimized.
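As an illustration of how the needed "N" follows from an expected difference and standard deviation, the sketch below uses the standard normal-approximation formula for comparing two group means, n = 2((z₁₋α/₂ + z₁₋β)·SD/Δ)² per group. This is one common textbook approach and an assumption on our part; it is not necessarily the method of [95]. The function name and default values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(mean_diff, sd, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison.

    Assumes equal group sizes and a common standard deviation. A t-based
    calculation would give a slightly larger number for small groups.
    """
    if mean_diff == 0 or sd <= 0:
        raise ValueError("mean_diff must be non-zero and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_beta) * sd / mean_diff) ** 2
    return ceil(n)


# Hypothetical planning values: expected difference of 2 points, SD of 2.
for target_power in (0.80, 0.90):
    print(target_power, animals_per_group(2.0, 2.0, power=target_power))
```

Because the required number scales with the square of SD/Δ, halving the variability of a test (for example, through a standardized protocol) cuts the required group size roughly fourfold.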
Animal models are one of the most commonly used methods in preclinical research. Within the animal models available, mouse MCAo, usually done on C57BL/6 male animals, is the most common, so we focused our research on studies using male C57BL/6 mice. The use of male C57BL/6 mice was historically justified by the fact that female animals are affected by estrogen hormone concentrations, which may increase the variability of behavioral testing. However, with CNS diseases affecting all individuals, a strong push for the inclusion of female animals in preclinical studies is starting to gain ground [97], as new reports find no difference in behavior testing between the sexes. It should nevertheless be noted that female mice have approximately 20% greater exercise endurance and are able to run approximately 54% more than their male counterparts [98].
We also focused exclusively on MCAo, as the model is almost synonymous with experimental stroke. We found that 73.21% of articles used the monofilament procedure to induce MCAo. One key aspect of this surgery is that it does not require craniotomies but rather produces a stroke by blocking a large cerebral artery, similar to a human stroke. The most common occlusion times found were 60, 90, and 120 min. This part of the model is extremely important, as occlusion time is directly proportional to brain tissue damage [99]. For example, the difference between 15 min and 30 min of occlusion represents an approximately five-fold increase in infarct area in C57BL/6 animals [100]. When infarct size increases, it involves larger damage to the cerebral hemisphere, including most of the ipsilateral cortex, corpus striatum, thalamus, hippocampus, piriform cortex, accumbens, and subventricular zone [101,102]. In contrast, a short MCAo (30 min) generates rapid infarction of the striatum and delayed infarction in the overlying cortex, associated with heat shock protein induction and immediate early gene induction in the cortex [103,104]. Infarcts after longer and permanent MCAo are widespread and involve both the striatum and cortex, as well as much of the ipsilateral cerebral hemisphere and a small region of the penumbral cortex [105][106][107]. In our case, most articles (24) had an ischemia time of 60 min followed by reperfusion; 12 articles had an ischemia time of less than 60 min, while 7 articles had an ischemia time of more than 90 min. In addition, some articles used methods of inducing a permanent stroke, as follows: 5 articles used photothrombosis, 3 articles used electrocauterization, 2 articles used CCA ligation combined with electrocauterization of the MCA, 1 article used administration of endothelin-1, another article used ligation of the CCA, and another article used permanent MCAo.
As such, our data show the behavior results of mice with longer occlusion periods and may not be applicable to shorter occlusion periods. Our analysis showed that animal examination scoring is reliable in detecting differences between Sham and MCAo mice immediately after stroke, making it an inexpensive and fast method to evaluate differences between groups (Figure 3). Within this acute period, both foot-fault and rotarod tests showed differences in motor tasks between groups (Figure 4A). The rotarod test was able to show differences at 24 and 48 h after surgery, but at 72 h, no statistical differences were observed (Figure 4B-D).
One of the most surprising findings of the present study was the large variation in the protocols applied for each test. This is reflected in the low number of articles found that follow the original animal examination scoring. For example, from 56 articles using neurological scales, only 13 were taken into account for this analysis because only they used the original score. The present meta-analysis was based on 56 studies with a median quality of 6 out of 10 (Figure 2C), higher than previous investigations [108][109][110], and a small percentage (23.2%) of the articles had a high risk of bias because the animal behavior assessment was performed in a nonblinded manner (Figure 2D). We identified three neurological scales commonly used for stroke studies. The first one, Garcia, is a neurological scale that highlights sensory and motor function as well as body symmetry in mice. It has the advantage that it is easy to use and makes a comprehensive assessment. Limited hind limb assessment and unreliable long-term follow-up are the main disadvantages. While other studies consider it appropriate for use for up to 7 days [14], our data show that the original Garcia neurological score was applied to evaluate animals only in the hyperacute post-stroke period (Figure 3). According to the articles included in the present research, Longa scoring was not able to distinguish between groups at any of the time points investigated (Supplementary Table S8). Our analysis shows that Clark neurological scoring is a better solution for a 7-day post-stroke evaluation (MD = 8.57 with a 95% CI of 7.90 to 9.25, p < 0.0001) (Figure 7).
Regarding animal examination scoring, we agree with the previous work that compared the efficacy of different neurological scales (Garcia, Longa, and Modo) in rat stroke models [111]. For hyper-acute post-stroke periods, we can recommend both Garcia and Clark scoring (Figure 3). However, since the meta-analysis only included male C57BL/6 mice without comorbidities such as advanced age, diabetes, or hypertension, our results should be validated for studies that include such animals. There is also the possibility that some groups did not publish all the results between Shams and MCAo due to publication bias regarding negative or neutral results [7,112].
According to our results, in evaluating the acute and early sub-acute post-stroke periods, one should focus on motor tasks rather than animal examination scores. In our opinion, although motor tests such as foot-fault, cylinder test, or rotarod can be used, they have different efficiencies. This is because some studies did not report any differences between the sham and MCAo groups. This uncertainty can also be caused by the small difference in the case of foot-fault at 14 days post-stroke (p = 0.006) in the meta-analysis. As such, a larger number of animals used, or even the volume of stroke elicited by each individual doing the surgery, could tilt the balance one way or another. Based on the data found, it is better to focus on quantifying differences in limb coordination, for which foot-fault is superior to the cylinder test. The foot-fault test is considered objective, highly effective, and capable of evaluating long-term outcomes (up to 90 days) in ischemic stroke [96]. The present work partially confirms these results, as our data show foot-fault to be effective up to 28 days post-stroke (Figure 6), making this test sensitive in detecting both acute and chronic motor coordination deficits after ischemic stroke.
However, for chronic evaluation, the present work shows that the rotarod test should be used. Rotarod is one of the most used tests in rodent stroke models. Although it can detect differences in the acute phases of stroke, our data show that at day 3 post-stroke, the test cannot differentiate between sham and MCAo animals (Figure 4); therefore, we do not recommend it in this interval. Outside of the acute phase, it is highly sensitive. Although some research suggests it can be effective up to 6 weeks after stroke [91], our data can only partially confirm this. This is because at the 17th and 18th day post-stroke time points, we were not able to detect a difference, whereas we did at all other investigated time points (Supplementary Table S8).
One of the most surprising findings of our literature search is the high degree of variation in the Morris water maze test. As such, we had few direct comparisons of data at different time points between studies (at 24 h, 23 days, 55 days, and 56 days). The value of this test is clear; it is one of the most used tests to highlight long-term cognitive impairment and motor deficits in rodent models of stroke. Previous work reported that stroke mice show an increased latency in finding the platform at 2, 4, and 6 weeks [91]. Due to differences between implementation protocols, a large number of articles (165) were excluded from the current meta-analysis. Although the remaining papers largely used the same MCAo-inducing protocols, there was not a perfect overlap in the evaluation. Even so, we found some conflicting results. For example, when applied at days 16 and 19 post-stroke, some studies found that the Morris water maze was able to detect differences between Sham and MCAo mice; however, no differences were observed at 17 and 18 days post-MCAo (Supplementary Table S8). We have summarized (Table 2) the behavioral tests, their usefulness in highlighting the various deficiencies caused by stroke, the optimal time window in which to perform them, and their advantages and disadvantages. Together with the many existing protocols, the inter-animal heterogeneity [91] means that results from different studies are difficult to compare directly, and it is our opinion that each lab should establish its own standard for this test.

Strengths and Limitations
One of the strengths of this study is that we looked at a homogenous group of animals, with all included animals being C57BL/6 mice that were subjected to the same testing protocol and to a middle cerebral artery occlusion protocol. We can identify some weaknesses in our meta-analysis. For example, none of the articles used in the present meta-analysis looked at aged, diabetic, or hypertensive animals. These comorbidities could, for example, increase the impact of animal examination scoring in animals after stroke versus healthy animals. The included studies have different ischemic periods (30 to 120 min before reperfusion), generating different infarct locations and volumes, which may affect the behavioral results. We cannot exclude the possibility that some studies did not publish the results between Shams and MCAo due to publication bias attributable to not reporting negative or neutral studies [112].
Although this study is an overview and the quality appraisal is optional, the quality of the articles has been evaluated, which is one of the strong points of the study. In addition, we conducted this meta-analysis based on the PRISMA guidelines, and all the steps of this study were done by two independent reviewers, which reduced errors and increased the power of the study. There are also potential limitations to this study. First, the literature search was conducted in five major electronic databases: PubMed, Web of Science, Science Direct, EMBASE, and Cochrane Reviews; no other databases were searched, nor was the "gray" literature. Another limitation is that we only included open access publications, and due to this fact, additional relevant studies might have been missed. Second, we included only studies written in English, and we did not make any correlation between the tests used and the treatments, although we summarized the therapies used in Supplementary Table S1. Third, we excluded articles published in preprint databases due to a lack of peer review.

Conclusions
With stroke being one of the most important causes of death and disability, the need for better treatment strategies is increasing. Preclinical research is a powerful tool in our understanding of the molecular and cellular response to stroke; however, a more standardized evaluation of animals post-stroke is needed to ensure reproducible results. Our results show that for hyperacute and acute post-stroke evaluation, animal examination scoring, especially Clark and Garcia, should be used, as it also has the advantage of being easy to use and effective. In order to evaluate differences between acute and subacute periods, the tests used should be based on motor tasks. We found rotarod and cylinder tests to be reliable in this interval, but their use in the chronic evaluation should be carefully considered, as the results of testing in this period are highly variable, depending on a plethora of factors regarding inter-individual variation, surgical differences, age, and comorbidities of the animals used.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/life13020567/s1, Table S1: Molecules, cells and procedures used in the included articles;

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author, B.C., upon reasonable request.